@delock
Collaborator

@delock delock commented Oct 1, 2025

This PR adds a blog/lab post studying ZenFlow and ZeRO Offload performance with DeepSpeed CPU core binding.

@delock delock requested a review from Copilot October 1, 2025 15:20
Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive blog/lab study examining ZenFlow technology and its performance improvements with DeepSpeed CPU core binding. The study investigates how CPU core binding affects both ZeRO Offload and ZenFlow performance, documenting specific optimizations and measurement results.

  • Documents ZenFlow technology and its relationship with DeepSpeed CPU core binding
  • Presents performance testing results comparing different core binding strategies
  • Introduces improvements to ZenFlow's core binding mechanism developed in collaboration with ZenFlow authors

@delock delock force-pushed the gma/zenflow_binding_study branch 2 times, most recently from a8a1be5 to 92d4d21 Compare October 1, 2025 15:26
@sfc-gh-truwase
Collaborator

@delock thanks for the PR. Can you please modify https://github.com/deepspeedai/DeepSpeed/blob/master/docs/index.md in order to update the Latest News section of the home page?

delock and others added 19 commits October 3, 2025 11:23
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
This PR adds a blog post for SuperOffload. More specifically, the blog
covers the design and motivation behind SuperOffload, comparisons with
previous approaches, key experiences and insights, and guidance on
enabling and using SuperOffload.

See also:
[PR#7559](#7559) -
SuperOffload implementation.
[PR#990](deepspeedai/DeepSpeedExamples#990) -
Examples.

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Since `make format` will generate `venv` directory, we should add it to
`.gitignore`.

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
This PR improves state management for DeepCompile in the engine.

Previously, the system relied only on the config flag indicating whether
DeepCompile was enabled. However, DeepCompile is actually activated only
when `compile()` is called. This meant that if DeepCompile was enabled
in the config but `compile()` was never called, it could lead to invalid
internal states (as shown in #7598).

Since `enabled == True` should be interpreted as an option that modifies
the behavior of `compile()`, this PR introduces clearer state
management:
- If .compile() is not called, the DeepCompile config has no effect on
behavior. A one-time message is shown instead.
- A new state, DeepCompile activated, is introduced. This represents the
condition where DeepCompile is both enabled in the config and .compile()
has been called.
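
The two states described above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `EngineSketch`, `deepcompile_active`, `step`, and the message text are hypothetical names chosen for this example.

```python
class EngineSketch:
    """Hypothetical stand-in for the engine; not DeepSpeed's real class."""

    def __init__(self, deepcompile_enabled: bool):
        # Config flag only: by itself it does not change behavior.
        self._deepcompile_enabled = deepcompile_enabled
        self._compile_called = False
        self._notice_shown = False
        self.messages = []

    def compile(self):
        # Calling compile() is what can activate DeepCompile.
        self._compile_called = True

    @property
    def deepcompile_active(self) -> bool:
        # "Activated" = enabled in the config AND compile() has been called.
        return self._deepcompile_enabled and self._compile_called

    def step(self) -> str:
        # If enabled but never compiled, emit a one-time message and
        # fall back to the normal (non-DeepCompile) path.
        if (self._deepcompile_enabled and not self._compile_called
                and not self._notice_shown):
            self.messages.append(
                "DeepCompile is enabled in the config but compile() "
                "was not called; the setting has no effect.")
            self._notice_shown = True
        return "deepcompile" if self.deepcompile_active else "eager"
```

Checking a single derived property instead of the raw config flag is what keeps "enabled but never compiled" from reaching code paths that assume compilation happened.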

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
PR #6993 replaces the flat IPG buffers with a dict maintaining
type-indexed buckets. The member is also renamed from
`_ipg_bucket_flat_buffer` to `ipg_buckets`.

Update the bucket clearing logic in `init_z3` accordingly.

Signed-off-by: Junjie Mao <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Polish SuperOffload blog post; minor grammar and style fixes

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
… when world size expansion. (#7599)

When the world size expands from 2 to 4, the checkpoint is converted to
a universal checkpoint and then loaded from it.
A new rank, for example rank 3, tries to load the model file
`zero_pp_rank_3_mp_rank_00_model_states.pt`, but this file was not
produced by the previous run.
For stage 3, just load the first file instead, that is
`zero_pp_rank_0_mp_rank_00_model_states`.
The existing unit test
TestZeROUniversalCheckpointDP::test_dp_world_size_2to4 can be used to
verify this problem.
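
A rough sketch of the file selection described above, under stated assumptions: `model_states_path` is a hypothetical helper, not a DeepSpeed function, and it reads "just load the first file" as "stage 3 always uses the rank 0 file". The behavior for other stages (each rank loads its own file) is also an assumption made for illustration.

```python
import os


def model_states_path(ckpt_dir: str, dp_rank: int, zero_stage: int,
                      mp_rank: int = 0) -> str:
    """Hypothetical helper: pick the model states file a rank should load."""

    def name(rank: int) -> str:
        return f"zero_pp_rank_{rank}_mp_rank_{mp_rank:02d}_model_states.pt"

    if zero_stage == 3:
        # New ranks (e.g. rank 3 after growing from 2 to 4) have no file of
        # their own from the previous run, so use the first file.
        return os.path.join(ckpt_dir, name(0))
    # Assumed behavior for other stages: each rank loads its own file.
    return os.path.join(ckpt_dir, name(dp_rank))
```

For example, `model_states_path("ckpt", dp_rank=3, zero_stage=3)` resolves to the rank 0 file rather than the missing `zero_pp_rank_3_mp_rank_00_model_states.pt`.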

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
delock and others added 5 commits October 3, 2025 11:23
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Guokai Ma <[email protected]>
@delock delock force-pushed the gma/zenflow_binding_study branch from 891454d to b21cb82 Compare October 3, 2025 03:24
@delock delock requested a review from tohtana as a code owner October 3, 2025 03:24
@delock
Collaborator Author

delock commented Oct 3, 2025

Hi @sfc-gh-truwase, the link from the index page has been added, along with other wording fixes. Thanks for the suggestions!

@sfc-gh-truwase sfc-gh-truwase merged commit 2b68bbc into master Oct 6, 2025
2 checks passed
@sfc-gh-truwase sfc-gh-truwase deleted the gma/zenflow_binding_study branch October 6, 2025 15:38
Liangliang-Ma pushed a commit to Liangliang-Ma/DeepSpeed that referenced this pull request Oct 13, 2025
This PR adds a blog/lab post studying ZenFlow and ZeRO Offload
performance with DeepSpeed CPU core binding.

---------

Signed-off-by: Guokai Ma <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Masahiro Tanaka <[email protected]>
Signed-off-by: Junjie Mao <[email protected]>
Co-authored-by: Xinyu Lian <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: zhengchenyu <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Junjie Mao <[email protected]>
Signed-off-by: Ma, Liangliang <[email protected]>
7 participants